=============== <Original Dataset> ===============

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 10 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | longitude | 20640 non-null | float64 |
| 1 | latitude | 20640 non-null | float64 |
| 2 | housing_median_age | 20640 non-null | float64 |
| 3 | total_rooms | 20640 non-null | float64 |
| 4 | total_bedrooms | 20433 non-null | float64 |
| 5 | population | 20640 non-null | float64 |
| 6 | households | 20640 non-null | float64 |
| 7 | median_income | 20640 non-null | float64 |
| 8 | median_house_value | 20640 non-null | float64 |
| 9 | ocean_proximity | 20640 non-null | object |

dtypes: float64(9), object(1)
memory usage: 1.6+ MB
None

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | ocean_proximity |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -122.23 | 37.88 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 | 452600.0 | NEAR BAY |
| 1 | -122.22 | 37.86 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 | 358500.0 | NEAR BAY |
| 2 | -122.24 | 37.85 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 | 352100.0 | NEAR BAY |
| 3 | -122.25 | 37.85 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 | 341300.0 | NEAR BAY |
| 4 | -122.25 | 37.85 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 | 342200.0 | NEAR BAY |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 20635 | -121.09 | 39.48 | 25.0 | 1665.0 | 374.0 | 845.0 | 330.0 | 1.5603 | 78100.0 | INLAND |
| 20636 | -121.21 | 39.49 | 18.0 | 697.0 | 150.0 | 356.0 | 114.0 | 2.5568 | 77100.0 | INLAND |
| 20637 | -121.22 | 39.43 | 17.0 | 2254.0 | 485.0 | 1007.0 | 433.0 | 1.7000 | 92300.0 | INLAND |
| 20638 | -121.32 | 39.43 | 18.0 | 1860.0 | 409.0 | 741.0 | 349.0 | 1.8672 | 84700.0 | INLAND |
| 20639 | -121.24 | 39.37 | 16.0 | 2785.0 | 616.0 | 1387.0 | 530.0 | 2.3886 | 89400.0 | INLAND |
20640 rows × 10 columns
=============== <Modified Dataset> ===============

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20433 entries, 0 to 20432
Data columns (total 9 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | longitude | 20433 non-null | float64 |
| 1 | latitude | 20433 non-null | float64 |
| 2 | housing_median_age | 20433 non-null | float64 |
| 3 | total_rooms | 20433 non-null | float64 |
| 4 | total_bedrooms | 20433 non-null | float64 |
| 5 | population | 20433 non-null | float64 |
| 6 | households | 20433 non-null | float64 |
| 7 | median_income | 20433 non-null | float64 |
| 8 | ocean_proximity | 20433 non-null | object |

dtypes: float64(8), object(1)
memory usage: 1.4+ MB
None

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | ocean_proximity |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -122.23 | 37.88 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 | NEAR BAY |
| 1 | -122.22 | 37.86 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 | NEAR BAY |
| 2 | -122.24 | 37.85 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 | NEAR BAY |
| 3 | -122.25 | 37.85 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 | NEAR BAY |
| 4 | -122.25 | 37.85 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 | NEAR BAY |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 20428 | -121.09 | 39.48 | 25.0 | 1665.0 | 374.0 | 845.0 | 330.0 | 1.5603 | INLAND |
| 20429 | -121.21 | 39.49 | 18.0 | 697.0 | 150.0 | 356.0 | 114.0 | 2.5568 | INLAND |
| 20430 | -121.22 | 39.43 | 17.0 | 2254.0 | 485.0 | 1007.0 | 433.0 | 1.7000 | INLAND |
| 20431 | -121.32 | 39.43 | 18.0 | 1860.0 | 409.0 | 741.0 | 349.0 | 1.8672 | INLAND |
| 20432 | -121.24 | 39.37 | 16.0 | 2785.0 | 616.0 | 1387.0 | 530.0 | 2.3886 | INLAND |
20433 rows × 9 columns
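
The two summaries above differ in a way that is consistent with dropping the rows that have a missing `total_bedrooms` value (20640 to 20433 rows), removing the `median_house_value` column, and renumbering the index. The code that produced the modified dataset is not shown in this log, so the following is only a minimal sketch of one way to get the same shape. The helper name `make_modified` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def make_modified(df: pd.DataFrame, target: str = "median_house_value") -> pd.DataFrame:
    """Drop rows with missing values, drop the target column, renumber the index."""
    cleaned = df.dropna()                       # remove rows where any column is NaN
    features = cleaned.drop(columns=[target])   # hold the target out of the features
    return features.reset_index(drop=True)      # index becomes 0..n-1 again


# Tiny synthetic frame (made-up subset of columns, not the full housing data).
toy = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0],
    "total_bedrooms": [129.0, np.nan, 190.0],
    "median_house_value": [452600.0, 358500.0, 352100.0],
})
modified = make_modified(toy)
print(modified.shape)
```

Resetting the index matters here because the modified dataset reports `RangeIndex: 20433 entries, 0 to 20432`, which would not be the case if the original row labels had been kept after dropping rows.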
=============== AutoML Start ===============

=============== Model : GMM ===============

Start calculating silhouette_score...( method = GMM )

best K_s = [2, 3]

covariance_type = full / init_params = kmeans / k = 2 Done.
========== Compare with original labels ==========

| predict | count | max | median | min | mean |
|---|---|---|---|---|---|
| 0.0 | 16735 | 500001.0 | 179500.0 | 14999.0 | 206703.069913 |
| 1.0 | 3698 | 500001.0 | 181800.0 | 22500.0 | 210049.914548 |

Name: median_house_value (count: int64; others: float64)

covariance_type = full / init_params = random / k = 2 Done.
========== Compare with original labels ==========

| predict | count | max | median | min | mean |
|---|---|---|---|---|---|
| 0.0 | 3783 | 500001.0 | 182900.0 | 22500.0 | 210182.047846 |
| 1.0 | 16650 | 500001.0 | 179300.0 | 14999.0 | 206655.962282 |

Name: median_house_value (count: int64; others: float64)

covariance_type = tied / init_params = kmeans / k = 2 Done.
========== Compare with original labels ==========

| predict | count | max | median | min | mean |
|---|---|---|---|---|---|
| 0.0 | 20269 | 500001.0 | 180600.0 | 14999.0 | 207814.374562 |
| 1.0 | 164 | 500001.0 | 119050.0 | 48300.0 | 144822.567073 |

Name: median_house_value (count: int64; others: float64)

covariance_type = tied / init_params = random / k = 2 Done.
========== Compare with original labels ==========

| predict | count | max | median | min | mean |
|---|---|---|---|---|---|
| 0.0 | 8983 | 500001.0 | 170500.0 | 14999.0 | 197207.962819 |
| 1.0 | 11450 | 500001.0 | 189100.0 | 14999.0 | 215233.303843 |

Name: median_house_value (count: int64; others: float64)

covariance_type = diag / init_params = kmeans / k = 2 Done.
========== Compare with original labels ==========

| predict | count | max | median | min | mean |
|---|---|---|---|---|---|
| 0.0 | 15792 | 500001.0 | 178300.0 | 14999.0 | 206122.450355 |
| 1.0 | 4641 | 500001.0 | 185400.0 | 14999.0 | 211345.555484 |

Name: median_house_value (count: int64; others: float64)

covariance_type = diag / init_params = random / k = 2 Done.
========== Compare with original labels ==========

| predict | count | max | median | min | mean |
|---|---|---|---|---|---|
| 0.0 | 4632 | 500001.0 | 185400.0 | 14999.0 | 211398.817573 |
| 1.0 | 15801 | 500001.0 | 178300.0 | 14999.0 | 206109.811784 |

Name: median_house_value (count: int64; others: float64)

covariance_type = full / init_params = kmeans / k = 3 Done.
========== Compare with original labels ==========

| predict | count | max | median | min | mean |
|---|---|---|---|---|---|
| 0.0 | 10396 | 500001.0 | 162700.0 | 14999.0 | 192951.768276 |
| 1.0 | 3109 | 500001.0 | 177100.0 | 22500.0 | 205003.323898 |
| 2.0 | 6928 | 500001.0 | 195400.0 | 22500.0 | 229887.202945 |

Name: median_house_value (count: int64; others: float64)

covariance_type = full / init_params = random / k = 3 Done.
========== Compare with original labels ==========

| predict | count | max | median | min | mean |
|---|---|---|---|---|---|
| 0.0 | 10937 | 500001.0 | 167700.0 | 14999.0 | 196782.568803 |
| 1.0 | 2832 | 500001.0 | 154300.0 | 22500.0 | 188129.555085 |
| 2.0 | 6664 | 500001.0 | 204250.0 | 14999.0 | 232735.084634 |

Name: median_house_value (count: int64; others: float64)

covariance_type = tied / init_params = kmeans / k = 3 Done.
========== Compare with original labels ==========

| predict | count | max | median | min | mean |
|---|---|---|---|---|---|
| 0.0 | 8599 | 500001.0 | 175000.0 | 14999.0 | 200490.849401 |
| 1.0 | 155 | 500001.0 | 116300.0 | 48300.0 | 140995.490323 |
| 2.0 | 11679 | 500001.0 | 183100.0 | 14999.0 | 213208.780204 |

Name: median_house_value (count: int64; others: float64)

covariance_type = tied / init_params = random / k = 3 Done.
========== Compare with original labels ==========

| predict | count | max | median | min | mean |
|---|---|---|---|---|---|
| 0.0 | 9284 | 500001.0 | 192200.0 | 14999.0 | 221758.488798 |
| 1.0 | 5805 | 500001.0 | 139700.0 | 14999.0 | 173236.156589 |
| 2.0 | 5344 | 500001.0 | 193500.0 | 22500.0 | 219217.582335 |

Name: median_house_value (count: int64; others: float64)

covariance_type = diag / init_params = kmeans / k = 3 Done.
========== Compare with original labels ==========

| predict | count | max | median | min | mean |
|---|---|---|---|---|---|
| 0.0 | 7709 | 500001.0 | 169300.0 | 14999.0 | 198071.215333 |
| 1.0 | 3386 | 500001.0 | 181300.0 | 22500.0 | 207559.938866 |
| 2.0 | 9338 | 500001.0 | 184400.0 | 14999.0 | 214843.810987 |

Name: median_house_value (count: int64; others: float64)

covariance_type = diag / init_params = random / k = 3 Done.
========== Compare with original labels ==========

| predict | count | max | median | min | mean |
|---|---|---|---|---|---|
| 0.0 | 7713 | 500001.0 | 182700.0 | 14999.0 | 210227.068456 |
| 1.0 | 10109 | 500001.0 | 175800.0 | 14999.0 | 203538.031160 |
| 2.0 | 2611 | 500001.0 | 187200.0 | 22500.0 | 213287.293374 |

Name: median_house_value (count: int64; others: float64)
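
The run above iterates over `covariance_type` (full, tied, diag), `init_params` (kmeans, random) and `k` (2, 3), and for each setting reports count, max, median, min and mean of `median_house_value` per predicted cluster. The AutoML code itself is not part of this log, so the following is only a rough sketch of how such a loop could look with scikit-learn's `GaussianMixture` and `silhouette_score`. The helper `compare_with_target`, the dictionaries `scores` and `summaries`, and the synthetic blobs are assumptions for illustration, not the original implementation or data.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture


def compare_with_target(labels, target):
    """Summarize the held-out target column per predicted cluster."""
    frame = pd.DataFrame({"predict": labels, target.name: target.to_numpy()})
    return frame.groupby("predict")[target.name].agg(
        ["count", "max", "median", "min", "mean"]
    )


# Synthetic stand-in data: two well-separated blobs, not the housing features.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(100, 2)),
    rng.normal(5.0, 0.5, size=(100, 2)),
])
target = pd.Series(
    np.concatenate([np.full(100, 100000.0), np.full(100, 300000.0)]),
    name="median_house_value",
)

scores, summaries = {}, {}
for cov in ("full", "tied", "diag"):
    for init in ("kmeans", "random"):
        for k in (2, 3):
            gmm = GaussianMixture(
                n_components=k,
                covariance_type=cov,
                init_params=init,
                random_state=0,
            )
            labels = gmm.fit_predict(X)
            if len(np.unique(labels)) < 2:
                continue  # silhouette is undefined when only one cluster is used
            scores[(cov, init, k)] = silhouette_score(X, labels)
            summaries[(cov, init, k)] = compare_with_target(labels, target)

best = max(scores, key=scores.get)
print("best setting:", best)
print(summaries[best])
```

When reading tables like the ones above, note that cluster labels are arbitrary: two settings can find nearly the same partition with labels 0 and 1 swapped, as the two `diag` runs at k = 2 appear to.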